(54) Voice analyzing and synthesizing apparatus and method, and program



(57) A voice analyzing apparatus comprises: first analyzing means for analyzing a voice into harmonic
components and inharmonic components; second analyzing means for analyzing a magnitude spectrum
envelope of the harmonic components into a magnitude spectrum envelope of a vocal cord vibration
waveform, resonances and a spectrum envelope of a difference of the magnitude spectrum envelope of
the harmonic components from a sum of the magnitude spectrum envelope of the vocal cord vibration
waveform and the resonances; and means for storing the inharmonic components, the magnitude
spectrum envelope of the vocal cord vibration waveform, resonances and the spectrum envelope of
the difference.
Description

CROSS REFERENCE TO RELATED APPLICATION 

[0001] This application is based on Japanese Patent Application No. 2001-067257, filed on March 9, 2001, the whole
contents of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION 

A) FIELD OF THE INVENTION

[0002] The present invention relates to a voice synthesizing apparatus, and more particularly to a voice synthesizing 
apparatus for synthesizing voices of a song sung by a singer. 

B) DESCRIPTION OF THE RELATED ART

[0003] Human voices are constituted of phonemes each constituted of a plurality of formants. In synthesizing voices
of a song sung by a singer, first, all formants constituting each of all phonemes capable of being produced by a singer
are generated and synthesized to form each phoneme. Next, a plurality of generated phonemes are sequentially coupled
and pitches are controlled in accordance with the melody to thereby synthesize voices of a song sung by a singer.
This method is applicable not only to human voices but also to musical sounds produced by a musical instrument such 
as a wind instrument. 

[0004] A voice synthesizing apparatus utilizing this method is already known. For example, Japanese Patent No.
2504172 discloses a formant sound generating apparatus which can generate a formant sound having even a high
pitch without generating unnecessary spectra.

[0005] The above-described formant sound generating apparatus and conventional voice synthesizing apparatus
cannot reproduce individual characters such as the voice quality, peculiarity and the like of each person if the pitch
only is changed, although they can synthesize, in a pseudo manner, voices of a song sung by a general person.

SUMMARY OF THE INVENTION

[0006] It is an object of the present invention to provide a voice synthesizing apparatus capable of synthesizing
voices of a song sung by a singer and reproducing individual characters such as the voice quality, peculiarity and the
like of each singer.

[0007] It is another object of the present invention to provide a voice synthesizing apparatus capable of synthesizing
more realistic voices of a song sung by a singer and singing the song in a state without unnaturalness.
[0008] According to one aspect of the present invention, there is provided a voice analyzing apparatus comprising:
first analyzing means for analyzing a voice into harmonic components and inharmonic components; second analyzing
means for analyzing a magnitude spectrum envelope of the harmonic components into a magnitude spectrum envelope
of a vocal cord vibration waveform, resonances and a spectrum envelope of a difference of the magnitude spectrum
envelope of the harmonic components from a sum of the magnitude spectrum envelope of the vocal cord vibration
waveform and the resonances; and means for storing the inharmonic components, the magnitude spectrum envelope
of the vocal cord vibration waveform, resonances and the spectrum envelope of the difference.
[0009] According to another aspect of the invention, there is provided a voice synthesizing apparatus comprising:
means for storing a magnitude spectrum envelope of a vocal cord vibration waveform, resonances and a spectrum
envelope of a difference of a magnitude spectrum envelope of harmonic components from a sum of the magnitude
spectrum envelope of the vocal cord vibration waveform and the resonances, respectively analyzed from the harmonic
components analyzed from a voice, and inharmonic components analyzed from the voice; means for inputting
information of a voice to be synthesized; means for generating a flat magnitude spectrum envelope; and means for adding
the inharmonic components, the magnitude spectrum envelope of the vocal cord vibration waveform, resonances and
the spectrum envelope of the difference, respectively read from said means for storing, to the flat magnitude spectrum
envelope, in accordance with the input information.

[0010] According to yet another aspect of the invention, there is provided a voice synthesizing apparatus comprising:
first analyzing means for analyzing a voice into harmonic components and inharmonic components; second analyzing
means for analyzing a magnitude spectrum envelope of the harmonic components into a magnitude spectrum envelope
of a vocal cord vibration waveform, resonances and a spectrum envelope of a difference of the magnitude spectrum
envelope of the harmonic components from a sum of the magnitude spectrum envelope of the vocal cord vibration
waveform and the resonances; means for storing the inharmonic components, the magnitude spectrum envelope of
the vocal cord vibration waveform, resonances and the spectrum envelope of the difference; means for inputting
information of a voice to be synthesized; means for generating a flat magnitude spectrum envelope; and means for
adding the inharmonic components, the magnitude spectrum envelope of the vocal cord vibration waveform, resonances
and the spectrum envelope of the difference, respectively read from said means for storing, to the flat magnitude
spectrum envelope, in accordance with the input information.

[0011] As above, it is possible to provide a voice synthesizing apparatus capable of synthesizing human musical
sounds and reproducing individual characters such as the voice quality, peculiarity and the like of each person.
[0012] It is also possible to provide a voice synthesizing apparatus capable of synthesizing more realistic voices of 
a song sung by a singer and singing a song in a state without unnaturalness. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0013] 



Fig. 1 is a diagram illustrating voice analysis according to an embodiment of the invention.
Fig. 2 is a graph showing a spectrum envelope of harmonic components.
Fig. 3 is a graph showing a magnitude spectrum envelope of inharmonic components.
Fig. 4 is a graph showing spectrum envelopes of a vocal cord vibration waveform.
Fig. 5 is a graph showing a change in Excitation Curve.
Fig. 6 is a graph showing spectrum envelopes formed by Vocal Tract Resonance.
Fig. 7 is a graph showing a spectrum envelope of a Chest Resonance waveform.
Fig. 8 is a graph showing the frequency characteristics of resonances.
Fig. 9 is a graph showing an example of Spectral Shape Differential.
Fig. 10 is a graph showing the magnitude spectrum envelope of the harmonic components HC shown in Fig. 2
analyzed into EpR parameters.
Figs. 11A and 11B are graphs showing examples of the total spectrum envelope when EGain of the Excitation
Curve shown in Fig. 10 is changed.
Figs. 12A and 12B are graphs showing examples of the total spectrum envelope when ESlope of the Excitation
Curve shown in Fig. 10 is changed.
Figs. 13A and 13B are graphs showing examples of the total spectrum envelope when ESlope Depth of the
Excitation Curve shown in Fig. 10 is changed.
Figs. 14A to 14C are graphs showing a change in EpR with a change in Dynamics.
Fig. 15 is a graph showing a change in the frequency characteristics when Opening is changed.
Fig. 16 is a block diagram of a song-synthesizing engine of a voice synthesizing apparatus.



DESCRIPTION OF THE PREFERRED EMBODIMENTS 



[0014] Fig. 1 is a diagram illustrating voice analysis. 

[0015] Voices input to a voice input unit 1 are sent to a voice analysis unit 2. The voice analysis unit 2 analyzes the
supplied voices every constant period. The voice analysis unit 2 analyzes an input voice into harmonic components
HC and inharmonic components UC, for example, by spectral modeling synthesis (SMS).

[0016] The harmonic components HC are components that can be represented by a sum of sine waves having some
frequencies and magnitudes. Dots shown in Fig. 2 indicate the frequency and magnitude (sine components) of an input
voice to be obtained as the harmonic components HC. In this embodiment, a set of straight lines interconnecting these
dots is used as a magnitude spectrum envelope. The magnitude spectrum envelope is shown by a broken line in Fig.
2. A fundamental frequency Pitch can be obtained at the same time when the harmonic components HC are obtained.
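
The straight-line envelope described above can be illustrated with a minimal sketch. The function name `envelope_db`, the (frequency, magnitude) pair representation, and the choice to hold the end values outside the range of the sine components are assumptions made for illustration; the embodiment only states that straight lines interconnect the dots.

```python
from bisect import bisect_right

def envelope_db(points, f_hz):
    """Evaluate a piecewise-linear magnitude envelope at f_hz.

    points: list of (frequency_hz, magnitude_db) sine components with
    distinct frequencies. Outside the covered range the nearest end value
    is held (an assumption, not stated in the text).
    """
    pts = sorted(points)
    freqs = [f for f, _ in pts]
    if f_hz <= freqs[0]:
        return pts[0][1]
    if f_hz >= freqs[-1]:
        return pts[-1][1]
    k = bisect_right(freqs, f_hz)          # first point strictly above f_hz
    (f0, m0), (f1, m1) = pts[k - 1], pts[k]
    return m0 + (m1 - m0) * (f_hz - f0) / (f1 - f0)
```

For sine components at 100 Hz (-10 dB) and 200 Hz (-20 dB), the sketch returns -15 dB at 150 Hz.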
[0017] The inharmonic components UC are noise components of the input voice unable to be analyzed as the
harmonic components HC. The inharmonic components UC are, for example, those shown in Fig. 3. The upper graph in
Fig. 3 shows a magnitude spectrum representative of the magnitude of the inharmonic components UC, and the lower
graph shows a phase spectrum representative of the phase of the inharmonic components UC. In this embodiment,
the magnitudes and phases of the inharmonic components UC themselves are recorded as frame information FL.
[0018] The magnitude spectrum envelope of the harmonic components extracted through analysis is analyzed into 
a plurality of excitation plus resonance (EpR) parameters to facilitate later processes. 

[0019] In this embodiment, the EpR parameters include four parameters: an Excitation Curve parameter, a Vocal
Tract Resonance parameter, a Chest Resonance parameter, and a Spectral Shape Differential parameter. Other EpR
parameters may also be used.

[0020] As will be later detailed, the Excitation Curve indicates a spectrum envelope of a vocal cord vibration waveform,
and the Vocal Tract Resonance is an approximation of the spectrum shape (formants) formed by a vocal tract as a
combination of several resonances. The Chest Resonance is an approximation of the formants of low frequencies
other than the formants of the Vocal Tract Resonance, formed as a combination of several resonances (particularly
chest resonances).

[0021] The Spectral Shape Differential represents the components unable to be expressed by the above-described 
three EpR parameters. Namely, the Spectral Shape Differential is obtained by subtracting the Excitation Curve, Vocal
Tract Resonance and Chest Resonance from the magnitude spectrum envelope. 

[0022] The inharmonic components UC and EpR parameters are stored in a storage unit 3 as pieces of frame
information FL1 to FLn.
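
One way to picture a piece of frame information is as a record holding the four EpR parameters and the inharmonic spectra. The sketch below is only an illustration: the class names, field names and the `store_frame` helper are hypothetical, and the embodiment does not specify any storage layout.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class EpRParameters:
    # Field names are hypothetical; the text names only the four parameter groups.
    excitation_curve: Dict[str, float]
    vocal_tract_resonance: List[Dict[str, float]]
    chest_resonance: List[Dict[str, float]]
    spectral_shape_differential: List[float]

@dataclass
class FrameInfo:
    epr: EpRParameters
    inharmonic_magnitude: List[float]   # magnitude spectrum of UC
    inharmonic_phase: List[float]       # phase spectrum of UC

def store_frame(storage: List[FrameInfo], frame: FrameInfo) -> int:
    """Append one frame and return its 1-based index n (as in FLn)."""
    storage.append(frame)
    return len(storage)
```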

[0023] Fig. 4 is a graph showing the spectrum envelope (Excitation Curve) of a vocal cord vibration waveform. The
Excitation Curve corresponds to the magnitude spectrum envelope of a vocal cord vibration waveform.
[0024] More specifically, the Excitation Curve is constituted of three EpR parameters: an EGain [dB] representative
of the magnitude of a vocal cord vibration waveform; an ESlope representative of a slope of the spectrum envelope of
the vocal cord vibration waveform; and an ESlope Depth representative of a depth from the maximum value to minimum
value of the spectrum envelope of the vocal cord vibration waveform.

[0025] By using these three EpR parameters, the magnitude spectrum envelope (Excitation Curve Mag dB) of the
Excitation Curve at a frequency f Hz can be given by the following equation:

$$\mathrm{ExcitationCurveMag}_{dB}(f_{Hz}) = \mathrm{EGain}_{dB} + \mathrm{ESlopeDepth}_{dB} \cdot \left(e^{-\mathrm{ESlope} \cdot f_{Hz}} - 1\right) \qquad \text{(a)}$$

[0026] It can be understood from this equation (a) that EGain changes only the overall signal magnitude of the
magnitude spectrum envelope of the Excitation Curve, and ESlope and ESlope Depth control the frequency
characteristics (slope) of the signal magnitude of the magnitude spectrum envelope of the Excitation Curve.
[0027] Fig. 5 is a graph showing a change in Excitation Curve by the equation (a). The Excitation Curve extends
starting from EGain [dB] at the frequency f = 0 Hz along an asymptote of EGain - ESlope Depth [dB]. ESlope determines
the slope of the Excitation Curve.
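
The behavior described in the two preceding paragraphs can be checked with a short sketch. It assumes that equation (a) has the form EGain + ESlope Depth · (e^(-ESlope · f) - 1), which matches the stated starting value at 0 Hz and the stated asymptote; the function name and the numeric values in the usage note are illustrative only.

```python
import math

def excitation_curve_mag_db(f_hz, egain_db, eslope, eslope_depth_db):
    """Excitation Curve magnitude in dB at frequency f_hz.

    Assumed form of equation (a): a decaying exponential that starts at
    EGain for f = 0 and approaches EGain - ESlopeDepth for large f.
    """
    return egain_db + eslope_depth_db * (math.exp(-eslope * f_hz) - 1.0)
```

With, say, EGain = -10 dB, ESlope = 0.001 and ESlope Depth = 40 dB (hypothetical values), the curve starts at -10 dB and levels off near -50 dB. Changing EGain shifts the whole curve, while ESlope and ESlope Depth change how quickly and how far it falls.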

[0028] Next, how EGain, ESlope and ESlope Depth are calculated will be described. In extracting the EpR parameters
from the magnitude spectrum envelope of the original harmonic components HC, first the above-described three EpR
parameters are calculated.
[0029] For example, EGain, ESlope and ESlope Depth are calculated by the following method.

[0030] First, the maximum magnitude of the original harmonic components HC at the frequency of 250 Hz or lower
is set to MAX [dB] and MIN is set to -100 [dB].

[0031] Next, the magnitude and frequency of the i-th sine components of the original harmonic components HC at
the frequency of 10,000 Hz are set to Sin Mag [i] [dB] and Sin Freq [i] [Hz], and the number of sine components at
the frequency of 10,000 Hz is set to N. The averages are calculated from the following equations (b1) and (b2), where
Sin Freq [0] is the lowest frequency of the sine components:
$$\mathrm{XAverage} = \frac{\sum_{i} \left(\mathrm{SinFreq}[i] - \mathrm{SinFreq}[0]\right)}{N} \qquad \text{(b1)}$$

$$\mathrm{YAverage} = \frac{\sum_{i} \log\left(\mathrm{SinMag}[i] - \mathrm{MIN}\right)}{N} \qquad \text{(b2)}$$

[0032] By using the equations (b1) and (b2), the following equations are set:

$$a = \log(\mathrm{MAX} - \mathrm{MIN}) \qquad \text{(b3)}$$

$$b = (a - \mathrm{YAverage}) / \mathrm{XAverage} \qquad \text{(b4)}$$

$$A = e^{a} \qquad \text{(b5)}$$

$$B = -b \qquad \text{(b6)}$$

$$A0 = A \cdot e^{b \cdot \mathrm{SinFreq}[0]} \qquad \text{(b7)}$$

[0033] By using the equations (b3) to (b7), EGain, ESlope and ESlope Depth are calculated by the following equations
(b8), (b9) and (b10):

$$\mathrm{EGain} = A0 + \mathrm{MIN} \qquad \text{(b8)}$$

$$\mathrm{ESlopeDepth} = A0 \qquad \text{(b9)}$$

$$\mathrm{ESlope} = B \qquad \text{(b10)}$$

[0034] The EpR parameters of EGain, ESlope and ESlope Depth can be calculated in the manner described above.
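
The calculation in paragraphs [0030] to [0033] can be sketched as follows. Several points are assumptions rather than statements of the text: the logarithm is taken as the natural logarithm (suggested by A = e^a), the 10,000 Hz figure is treated as an upper bound on the sine components used, and the exponent in (b7) is taken as b · Sin Freq [0]. ESlope is returned as B = -b exactly as equations (b6) and (b10) give it; the sketch does not try to settle which sign convention equation (a) expects. The function name and argument names are hypothetical.

```python
import math

def estimate_excitation(sin_freq_hz, sin_mag_db, min_db=-100.0,
                        low_band_hz=250.0, max_freq_hz=10000.0):
    """Estimate EGain, ESlope and ESlopeDepth from sine components.

    Requires at least two components, at least one at or below low_band_hz,
    and all magnitudes above min_db.
    """
    comps = sorted((f, m) for f, m in zip(sin_freq_hz, sin_mag_db)
                   if f <= max_freq_hz)          # assumed upper bound
    n = len(comps)
    f0 = comps[0][0]                              # SinFreq[0], lowest frequency
    max_db = max(m for f, m in comps if f <= low_band_hz)        # MAX
    x_avg = sum(f - f0 for f, _ in comps) / n                    # (b1)
    y_avg = sum(math.log(m - min_db) for _, m in comps) / n      # (b2)
    a = math.log(max_db - min_db)                                # (b3)
    b = (a - y_avg) / x_avg                                      # (b4)
    A = math.exp(a)                                              # (b5)
    B = -b                                                       # (b6)
    A0 = A * math.exp(b * f0)                     # (b7), exponent assumed
    return {"EGain": A0 + min_db, "ESlopeDepth": A0, "ESlope": B}  # (b8)-(b10)
```

When the magnitudes above MIN decay exactly exponentially with frequency, b recovers the decay rate, so the averages in (b1) and (b2) act as a simple two-point fit in the log domain.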
[0035] Fig. 6 is a graph showing a spectrum envelope formed by Vocal Tract Resonance. The Vocal Tract Resonance 
is an approximation of the spectrum shape (formants) formed by a vocal tract as a combination of several resonances. 
[0036] For example, a difference between phonemes such as "a" and "i" produced by a human corresponds to a
difference of the shapes of mountains of a magnitude spectrum envelope mainly caused by a change in the shape of
the vocal tract. This mountain is called a formant. An approximation of formants can be obtained by using resonances.
[0037] In the example shown in Fig. 6, formants are approximated by using eleven resonances. The i-th resonance
is represented by Resonance [i] and the magnitude of the i-th resonance at a frequency f is represented by Resonance
[i] Mag (f). The magnitude spectrum envelope of Vocal Tract Resonance can be given by the following equation (c1):

$$\mathrm{VocalTractResonanceMag}_{dB}(f_{Hz}) = \mathrm{TodB}\left(\sum_{i} \mathrm{Resonance}[i]\mathrm{Mag}_{linear}(f_{Hz})\right) \qquad \text{(c1)}$$

[0038] By representing the phase of the i-th resonance by Resonance [i] Phase (f), the phase (phase spectrum) of
Vocal Tract Resonance can be given by the following equation (c2):

$$\mathrm{VocalTractResonancePhase}(f_{Hz}) = \sum_{i} \mathrm{Resonance}[i]\mathrm{Phase}(f_{Hz}) \qquad \text{(c2)}$$

[0039] Each Resonance [i] can be expressed by three EpR parameters: a center frequency F, a bandwidth Bw and
an amplitude Amp. How a resonance is calculated will be later described. 

[0040] Fig. 7 is a graph showing a spectrum envelope (Chest Resonance) of a chest resonance waveform. Chest
Resonance is formed by a chest resonance and expressed by mountains (formants) of the magnitude spectrum
envelope at low frequencies unable to be represented by Vocal Tract Resonance, the mountains (formants) being formed
by using resonances.

[0041] The i-th resonance of chest resonances is represented by CResonance [i] and the magnitude of the i-th 
resonance at a frequency f is represented by CResonance [i] Mag (f). The magnitude spectrum envelope of Chest 
Resonance can be given by the following equation (d): 

$$\mathrm{ChestResonanceMag}_{dB}(f_{Hz}) = \mathrm{TodB}\left(\sum_{i} \mathrm{CResonance}[i]\mathrm{Mag}_{linear}(f_{Hz})\right) \qquad \text{(d)}$$

[0042] Each CResonance [i] can be expressed by three EpR parameters: a center frequency F, a bandwidth Bw and
an amplitude Amp. How a resonance is calculated will be described.

[0043] Each resonance (Resonance [i] of Vocal Tract Resonance and CResonance [i] of Chest Resonance) can be
defined by three EpR parameters: the central frequency F, bandwidth Bw and amplitude Amp.
[0044] The transfer function in the z-domain of a resonance having the central frequency F and bandwidth Bw can be
expressed by the following equation (e1):

$$T(z) = \frac{A}{1 - B z^{-1} - C z^{-2}} \qquad \text{(e1)}$$

where:

$$z = e^{j 2 \pi f T} \qquad \text{(e2)}$$

$$T = \text{sampling period} \qquad \text{(e3)}$$

$$C = -e^{-2 \pi \mathrm{Bw}\, T} \qquad \text{(e4)}$$

$$B = 2 e^{-\pi \mathrm{Bw}\, T} \cos(2 \pi F T) \qquad \text{(e5)}$$

$$A = 1 - B - C \qquad \text{(e6)}$$

[0045] This frequency response can be expressed by the following equation (e7):

$$T(f) = \frac{1 - B - C}{1 - B\cos(2\pi f T) - C\cos(4\pi f T) + j\left[B\sin(2\pi f T) + C\sin(4\pi f T)\right]} \qquad \text{(e7)}$$

[0046] Fig. 8 is a graph showing examples of the frequency characteristics of resonances. In these examples, the
resonance center frequency F was 1500 Hz, and the bandwidth Bw and amplitude Amp were changed.
[0047] As shown in Fig. 8, the amplitude |T(f)| becomes maximum at a frequency f = the central frequency F. This
maximum value is the resonance amplitude Amp. The Resonance (f) (linear value) of a resonance having the central
frequency F, bandwidth Bw and amplitude Amp (linear value) represented by the equation (e7) can be given by the
following equation (e8):

$$\mathrm{Resonance}(f_{Hz}) = \mathrm{Amp} \cdot \frac{T(f_{Hz})}{\left|T(F_{Hz})\right|} \qquad \text{(e8)}$$

[0048] The magnitude of resonance at the frequency f can therefore be given by the following equation (e9) and the
phase can be given by the following equation (e10):

$$\mathrm{ResonanceMag}_{linear}(f_{Hz}) = \left|\mathrm{Resonance}(f_{Hz})\right| \qquad \text{(e9)}$$

$$\mathrm{ResonancePhase}(f_{Hz}) = \angle\,\mathrm{Resonance}(f_{Hz}) \qquad \text{(e10)}$$
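
The resonance model can be sketched numerically as below. The sketch assumes a two-pole resonator T(z) = A / (1 - B z^-1 - C z^-2) with A = 1 - B - C, scaled so that the magnitude at the central frequency F equals Amp, and it assumes TodB means 20·log10 of a linear amplitude. Function names and the tuple layout of a resonance are hypothetical.

```python
import cmath
import math

def resonator_response(f_hz, center_hz, bw_hz, period_s):
    """Complex frequency response T(f) of the assumed two-pole resonator."""
    c = -math.exp(-2.0 * math.pi * bw_hz * period_s)
    b = 2.0 * math.exp(-math.pi * bw_hz * period_s) * math.cos(
        2.0 * math.pi * center_hz * period_s)
    a = 1.0 - b - c                       # unity gain at f = 0
    z_inv = cmath.exp(-2j * math.pi * f_hz * period_s)
    return a / (1.0 - b * z_inv - c * z_inv * z_inv)

def resonance(f_hz, center_hz, bw_hz, amp, period_s):
    """Resonance(f): T(f) scaled so that |Resonance(F)| equals amp."""
    peak = abs(resonator_response(center_hz, center_hz, bw_hz, period_s))
    return amp * resonator_response(f_hz, center_hz, bw_hz, period_s) / peak

def to_db(linear):
    """Assumed TodB: 20*log10 of a linear amplitude."""
    return 20.0 * math.log10(linear)

def resonances_mag_db(f_hz, resonances, period_s):
    """Sum linear magnitudes of (center_hz, bw_hz, amp) resonances, then convert to dB."""
    total = sum(abs(resonance(f_hz, fc, bw, amp, period_s))
                for fc, bw, amp in resonances)
    return to_db(total)

def resonances_phase(f_hz, resonances, period_s):
    """Sum the phases of the individual resonances."""
    return sum(cmath.phase(resonance(f_hz, fc, bw, amp, period_s))
               for fc, bw, amp in resonances)
```

For a resonance at 1500 Hz, as in the Fig. 8 examples, evaluating `resonance` at 1500 Hz returns a value whose magnitude equals the chosen Amp, whatever the bandwidth.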

[0049] Fig. 9 shows an example of Spectral Shape Differential. Spectral Shape Differential corresponds to the
components of the magnitude spectrum envelope of the original input voice unable to be expressed by Excitation Curve,
Vocal Tract Resonance and Chest Resonance.

[0050] By representing these components by Spectral Shape Differential Mag (f) [dB], the following equation (f) is
satisfied:

$$\mathrm{OrgMag}_{dB}(f_{Hz}) = \mathrm{ExcitationCurveMag}_{dB}(f_{Hz}) + \mathrm{ChestResonanceMag}_{dB}(f_{Hz}) + \mathrm{VocalTractResonanceMag}_{dB}(f_{Hz}) + \mathrm{SpectralShapeDifferentialMag}_{dB}(f_{Hz}) \qquad \text{(f)}$$

[0051] Namely, Spectral Shape Differential is a difference between the other EpR parameters and the original
harmonic components, this difference being calculated at a constant frequency interval. For example, the difference is
calculated at a 50 Hz interval and a straight-line interpolation is performed between adjacent points.
[0052] The magnitude spectrum envelope of the harmonic components of the original input voice can be reproduced
from the equation (f) by using the EpR parameters.
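
A minimal sketch of this step, assuming the original envelope and the sum of the three other EpR terms are available as functions of frequency in dB: the differential is sampled every 50 Hz, interpolated linearly between samples, and added back to reproduce the envelope. The function names, the callable-based interface and the clamping beyond the last sample are assumptions for illustration.

```python
def spectral_shape_differential(org_mag_db, model_mag_db, max_freq_hz, step_hz=50.0):
    """Sample OrgMag - (ExcitationCurve + ChestResonance + VocalTractResonance) on a fixed grid."""
    n = int(max_freq_hz // step_hz) + 1
    return [org_mag_db(k * step_hz) - model_mag_db(k * step_hz) for k in range(n)]

def ssd_at(ssd, f_hz, step_hz=50.0):
    """Straight-line interpolation between adjacent samples (needs at least two samples)."""
    pos = min(max(f_hz / step_hz, 0.0), len(ssd) - 1.0)   # clamp to the sampled range
    k = min(int(pos), len(ssd) - 2)
    frac = pos - k
    return ssd[k] + (ssd[k + 1] - ssd[k]) * frac

def reconstruct_mag_db(model_mag_db, ssd, f_hz, step_hz=50.0):
    """Reproduce the harmonic envelope as in equation (f)."""
    return model_mag_db(f_hz) + ssd_at(ssd, f_hz, step_hz)
```

At the grid frequencies the reconstruction matches the original envelope exactly; between grid points it matches to the extent the difference is well approximated by straight lines.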

[0053] Approximately the same original input voice can be recovered by adding the inharmonic components to the
magnitude spectrum envelope of the reproduced harmonic components.

[0054] Fig. 10 is a graph showing the magnitude spectrum envelope of the harmonic components HC shown in Fig.
2 analyzed into EpR parameters.

[0055] Fig. 10 shows: Vocal Tract Resonance corresponding to the resonances having the center frequency higher
than the second mountain shown in Fig. 6; Chest Resonance corresponding to the resonance having the lowest center
frequency shown in Fig. 7; Spectral Shape Differential indicated by a dotted line shown in Fig. 9; and Excitation Curve
indicated by a bold broken line.

[0056] The resonances corresponding to Vocal Tract Resonance and Chest Resonance are added to Excitation
Curve. Spectral Shape Differential has a difference value of 0 on Excitation Curve.

[0057] Next, how the whole spectrum envelope changes if Excitation Curve is changed will be described.
[0058] Figs. 11A and 11B show examples of the whole spectrum envelope when EGain of Excitation Curve shown
in Fig. 10 is changed.

[0059] As shown in Fig. 11A, as EGain is made large, the gain (magnitude) of the whole spectrum envelope becomes
large. However, since the shape of the spectrum envelope does not change, the tone color is not changed. Only the
volume can therefore be made large.

[0060] As shown in Fig. 11B, as EGain is made small, the gain (magnitude) of the whole spectrum envelope becomes
small. However, since the shape of the spectrum envelope does not change, the tone color is not changed. Only the
volume can therefore be made small.

[0061] Figs. 12A and 12B show examples of the whole spectrum envelope when ESlope of Excitation Curve shown
in Fig. 10 is changed.

[0062] As shown in Fig. 12A, as ESlope is made large, although the gain (magnitude) of the whole spectrum envelope
does not change, the shape of the spectrum envelope changes so that the tone color changes. By setting ESlope
large, the unclear tone color with a suppressed high frequency range can be obtained.
[0063] As shown in Fig. 12B, as ESlope is made small, although the gain (magnitude) of the whole spectrum envelope
does not change, the shape of the spectrum envelope changes so that the tone color changes. By setting ESlope
small, the bright tone color with an enhanced high frequency range can be obtained.

[0064] Figs. 13A and 13B show examples of the whole spectrum envelope when ESlope Depth of Excitation Curve
shown in Fig. 10 is changed.

[0065] As shown in Fig. 13A, as ESlope Depth is made large, although the gain (magnitude) of the whole spectrum
envelope does not change, the shape of the spectrum envelope changes so that the tone color changes. By setting
ESlope Depth large, the unclear tone color with a suppressed high frequency range can be obtained.
[0066] As shown in Fig. 13B, as ESlope Depth is made small, although the gain (magnitude) of the whole spectrum
envelope does not change, the shape of the spectrum envelope changes so that the tone color changes. By setting
ESlope Depth small, the bright tone color with an enhanced high frequency range can be obtained.
[0067] The effects of changing ESlope and ESlope Depth are very similar.

[0068] Next, a method of simulating a change in tone color of real voice when EpR parameters are changed will be
described. For example, assuming that one-frame phoneme data of a voiced sound such as "a" is represented by the
EpR parameters and Dynamics (the volume of voice production), a change in tone color to be changed by Dynamics
of real voice production is simulated by changing EpR parameters. Generally, voice production at a small volume
suppresses high frequency components, and the larger the volume becomes, the more the high frequency components
increase, although this changes from one voice producer to another.

[0069] Figs. 14A to 14C are graphs showing a change in EpR parameters as Dynamics is changed. Fig. 14A shows
a change in EGain, Fig. 14B shows a change in ESlope, and Fig. 14C shows a change in ESlope Depth.

[0070] The abscissa in Figs. 14A to 14C represents a value of Dynamics from 0 to 1.0. The Dynamics value 0
represents the smallest voice production, the Dynamics value 1.0 represents the largest voice production, and the
Dynamics value 0.5 represents a normal voice production.

EP 1 239 463 A2

[0071] A database Timbre DB to be described later stores EGain, ESlope and ESlope Depth for the normal voice
production, these EpR parameters being changed in accordance with the functions shown in Figs. 14A to 14C. More
specifically, the function shown in Fig. 14A is represented by FEGain (Dynamics), the function shown in Fig. 14B is
represented by FESlope (Dynamics), and the function shown in Fig. 14C is represented by FESlope Depth (Dynamics).
If a Dynamics parameter is given, the parameters can be expressed by the following equations (g1) to (g3):

$$\mathrm{NewEGain}_{dB} = \mathrm{FEGain}_{dB}(\mathrm{Dynamics}) \qquad \text{(g1)}$$

$$\mathrm{NewESlope} = \mathrm{OriginalESlope} \cdot \mathrm{FESlope}(\mathrm{Dynamics}) \qquad \text{(g2)}$$

$$\mathrm{NewESlopeDepth}_{dB} = \mathrm{OriginalESlopeDepth}_{dB} + \mathrm{FESlopeDepth}_{dB}(\mathrm{Dynamics}) \qquad \text{(g3)}$$

where Original ESlope and Original ESlope Depth are the original EpR parameters stored in the database Timbre DB.
[0072] The functions shown in Figs. 14A to 14C are obtained by analyzing the parameters of the same phoneme
reproduced at various degrees of voice production (Dynamics). By using these functions, the EpR parameters are
changed in accordance with Dynamics. It can be considered that the changes shown in Figs. 14A to 14C may differ
for each phoneme, each voice producer and the like. Therefore, by making the function for each phoneme and each
voice producer, a change analogous to more realistic voice production can be obtained.
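
The way Dynamics modifies the three Excitation Curve parameters can be sketched as below: EGain is replaced by a function of Dynamics, ESlope is multiplied by one, and ESlope Depth has one added. The function name and dictionary keys are hypothetical, and the three curve functions are passed in as callables because their actual shapes come from Figs. 14A to 14C, which are not reproduced here.

```python
def apply_dynamics(original, dynamics, f_egain_db, f_eslope, f_eslope_depth_db):
    """Return new Excitation Curve parameters for a Dynamics value in [0, 1.0].

    original: dict with keys "ESlope" and "ESlopeDepth" (as stored for normal
    voice production). The three f_* arguments are per-phoneme, per-singer
    functions of Dynamics supplied by the caller.
    """
    return {
        "EGain": f_egain_db(dynamics),                                        # (g1)
        "ESlope": original["ESlope"] * f_eslope(dynamics),                    # (g2)
        "ESlopeDepth": original["ESlopeDepth"] + f_eslope_depth_db(dynamics)  # (g3)
    }
```

If the supplied functions give a factor of 1 for ESlope and an offset of 0 dB for ESlope Depth at Dynamics = 0.5, the stored parameters for normal voice production are returned unchanged at that value.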

[0073] Next, with reference to Fig. 15, a method of reproducing a change in tone color when Opening of a mouth is
changed for the voice production of the same phoneme will be described.

[0074] Fig. 15 is a graph showing a change in frequency characteristics when Opening is changed. Similar to
Dynamics, the Opening parameter is assumed to take values from 0 to 1.0.

[0075] The Opening value 0 represents the smallest opening of a mouth (low opening), the Opening value 1.0
represents the largest opening of a mouth (high opening), and the Opening value 0.5 represents a normal opening of a
mouth (normal opening).

[0076] The database Timbre DB to be described later stores EpR parameters obtained when a voice is produced at
the normal mouth opening. The EpR parameters are changed so that they have the frequency characteristics shown
in Fig. 15 at the desired mouth opening degree.

[0077] In order to realize this change, the amplitude (EpR parameter) of each resonance is changed as shown in
Fig. 15. For example, the frequency characteristics are not changed when a voice is produced at the normal mouth
opening degree (normal opening). When a voice is produced at the smallest mouth opening degree (low opening), the
amplitudes of the components at 1 to 5 kHz are lowered. When a voice is produced at the largest mouth opening
degree (high opening), the amplitudes of the components at 1 to 5 kHz are raised.

[0078] This change function is represented by FOpening (f). The EpR parameters can be changed so that they have
the frequency characteristics at the desired mouth opening degree, i.e., the frequency characteristics such as shown
in Fig. 15, by changing the amplitude of each resonance by the following equation (h):

$$\mathrm{NewResonance}[i]\mathrm{Amp}_{dB} = \mathrm{OriginalResonance}[i]\mathrm{Amp}_{dB} + \mathrm{FOpening}_{dB}\left(\mathrm{OriginalResonance}[i]\mathrm{Freq}_{Hz}\right) \cdot (0.5 - \mathrm{Opening}) / 0.5 \qquad \text{(h)}$$



[0079] The function FOpening (f) is obtained by analyzing the parameters of the same phoneme produced at various
mouth opening degrees. By using this function, the EpR parameters are changed in accordance with the Opening
values. It can be considered that this change may differ for each phoneme, each voice producer and the like. Therefore,
by making the function for each phoneme and each voice producer, a change analogous to more realistic voice
production can be obtained.

[0080] The equation (h) corresponds to the i-th resonance. Original Resonance [i] Amp and Original Resonance [i]
Freq represent respectively the amplitude and center frequency (EpR parameters) of the resonance stored in the
database Timbre DB. New Resonance [i] Amp represents the amplitude of a new resonance.
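
The Opening adjustment can be sketched as follows, assuming each resonance is held as a dictionary with an amplitude in dB and a center frequency in Hz. The function name, dictionary keys and the shape of the change function passed in are hypothetical; the real FOpening (f) is obtained by analysis as described above.

```python
def apply_opening(resonances, opening, f_opening_db):
    """Shift each resonance amplitude (dB) according to the Opening value in [0, 1.0].

    The shift is FOpening(center frequency) scaled by (0.5 - Opening) / 0.5,
    so Opening = 0.5 leaves the stored amplitudes unchanged.
    """
    scale = (0.5 - opening) / 0.5
    return [
        {**r, "amp_db": r["amp_db"] + f_opening_db(r["freq_hz"]) * scale}
        for r in resonances
    ]
```

With a change function that is negative between 1 and 5 kHz, Opening = 0 lowers the amplitudes of resonances in that band and Opening = 1.0 raises them by the same amount, which matches the behavior described for low and high opening.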
[0081] Next, how a song is synthesized will be described with reference to Fig. 16. 

[0082] Fig. 16 is a block diagram of a song-synthesizing engine of a voice synthesizing apparatus. The
song-synthesizing engine has at least an input unit 4, a pulse generator unit 5, a windowing & FFT unit 6, a database 7, a
plurality of adder units 8a to 8g and an IFFT & overlap unit 9.

[0083] The input unit 4 is input with a pitch, a voice intensity, a phoneme and other information in accordance with
a melody of a song sung by a singer, at each frame period, for example, 5 ms. The other information is, for example,
vibrato information including vibrato speed and depth. Information input to the input unit 4 is branched to two series to
be sent to the pulse generator unit 5 and database 7.

[0084] The pulse generator unit 5 generates, on the time axis, pulses having a pitch interval corresponding to a pitch 
input from the input unit 4. By changing the gain and pitch interval of the generated pulses to provide the generated 
pulses themselves with a fluctuation of the gain and pitch interval, so called harsh voices and the like can be produced. 

[0085] If the present frame is a voiceless sound, there is no pitch so that the process by the pulse generator unit 5
is not necessary. The process by the pulse generator unit 5 is performed only when a voiced sound is produced.
[0086] The windowing & FFT unit 6 windows a pulse (time waveform) generated by the pulse generator unit 5 and
then performs fast Fourier transform to convert the pulse into frequency range information. A magnitude spectrum of
the converted frequency range information is flat over the whole range. An output from the windowing & FFT unit 6 is
separated into the phase spectrum and magnitude spectrum.
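
The flatness of the magnitude spectrum can be illustrated with a small sketch, assuming the window contains a single pulse. A plain discrete Fourier transform stands in for the FFT, and the Hann window and frame length of 64 samples are arbitrary choices for illustration; the text does not name a window type or size.

```python
import cmath
import math

def hann_window(n):
    """Hann window of length n (window type is an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def dft(x):
    """Direct discrete Fourier transform, used here in place of an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def pulse_spectrum(n=64):
    """Window a single unit pulse at the frame center and split its spectrum."""
    frame = [0.0] * n
    frame[n // 2] = 1.0
    windowed = [s * w for s, w in zip(frame, hann_window(n))]
    spectrum = dft(windowed)
    magnitude = [abs(v) for v in spectrum]
    phase = [cmath.phase(v) for v in spectrum]
    return magnitude, phase
```

Every bin of `magnitude` has the same value (the window value at the pulse position), so the spectral shape can be imposed afterwards by adding the EpR terms to this flat envelope.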

[0087] The database 7 prepares several databases to be used for synthesizing voices of a song. In this embodiment, 
the database 7 prepares Timbre DB, Stationary DB, Articulation DB, Note DB and Vibrato DB. 
[0088] In accordance with the information input to the input unit 4, the database 7 reads necessary databases to
calculate EpR parameters and inharmonic components necessary for synthesis at some timings. Timbre DB stores
typical EpR parameters of one frame for each phoneme of a voiced sound (vowel, nasal sound, voiced consonant). It
also stores EpR parameters of one frame of the same phoneme corresponding to each of a plurality of pitches. By
using these pitches and interpolation, EpR parameters corresponding to a desired pitch can be obtained.
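
A minimal sketch of the pitch interpolation, assuming linear interpolation in Hz between the two nearest stored pitches and holding the end values outside the stored range. The text says only that interpolation is used, so the interpolation rule, the function name and the data layout are all assumptions.

```python
def interpolate_epr(stored, pitch_hz):
    """Interpolate scalar EpR parameters for a desired pitch.

    stored: list of (pitch_hz, {parameter_name: value}) entries for one
    phoneme, with distinct pitches and identical parameter names.
    """
    pts = sorted(stored, key=lambda p: p[0])
    if pitch_hz <= pts[0][0]:
        return dict(pts[0][1])
    if pitch_hz >= pts[-1][0]:
        return dict(pts[-1][1])
    for (p0, e0), (p1, e1) in zip(pts, pts[1:]):
        if p0 <= pitch_hz <= p1:
            t = (pitch_hz - p0) / (p1 - p0)
            return {k: e0[k] + (e1[k] - e0[k]) * t for k in e0}
```

For entries stored at 220 Hz and 440 Hz, a request for 330 Hz returns parameter values halfway between the two.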
[0089] Stationary DB stores stable analysis frames of several seconds for each phoneme produced in a prolonged
manner, as well as the harmonic components (EpR parameters) and inharmonic components. For example, assuming
that the frame interval is 5 ms and the stable sound production time is 1 sec, then Stationary DB stores information of
200 frames for each phoneme.

[0090] Since Stationary DB stores EpR parameters obtained through analysis of an original voice, it has information
such as fine fluctuation of the original voice. By using this information, fine change can be given to EpR parameters
obtained from Timbre DB. It is therefore possible to reproduce the natural pitch, gain, resonance and the like of the
original voice. By adding inharmonic components, more natural synthesized voices can be realized.

[0091] . Articulation stores an analyzed change part from one phoneme to another phoneme as well as the harmonic 
components (EpR parameters) and inharmonic components. When a voice changing from one phoneme to another 
phoneme Is synthesized, Articulation is referred to and a change in EpR parameters and the inhannonic components 
is used for this changing part to reproduce a natural phoneme change. 

[0092] Note DB is constituted of three databases: Attack DB, Release DB and Note Transition DB. They store information
of a change in gain (EGain) and pitch and other information obtained through analysis of an original voice (real
voice), respectively for a sound production start part, a sound release part, and a note transition part.
[0093] For example, if a change in gain (EGain) and pitch stored in Attack DB is added to EpR parameters for the
sound production start part, a change in gain and pitch like that of a natural real voice can be added to the synthesized voice.

[0094] Vibrato DB stores information of a change in gain (EGain) and pitch and other information obtained through
analysis of a vibrato part of the original voice (real voice).

[0095] For example, if there is a vibrato part to be given to a voice to be synthesized, a change in gain (EGain) and
pitch stored in Vibrato DB is added to the EpR parameters of the vibrato part so that a natural change in gain and pitch
can be added to the synthesized voice. Namely, natural vibrato can be reproduced.
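How a stored change in gain and pitch might be applied, as in [0093] and [0095], is sketched below. The units are assumptions: gain changes are treated as dB offsets and pitch changes as offsets in cents, neither of which is stated in the text. The function name `apply_expression` and the template values are hypothetical.

```python
def apply_expression(base_gain_db, base_pitch_hz, gain_delta_db, pitch_delta_cents):
    """Add per-frame gain and pitch changes to constant base values.

    Assumptions: gain changes are dB offsets and pitch changes are offsets
    in cents; the stored units are not specified in the text.
    """
    frames = []
    for dg, dc in zip(gain_delta_db, pitch_delta_cents):
        gain = base_gain_db + dg
        pitch = base_pitch_hz * 2.0 ** (dc / 1200.0)
        frames.append((gain, pitch))
    return frames

# Illustrative vibrato template: four frames of gain and pitch deviation.
vibrato = apply_expression(-8.0, 440.0, [0.0, 0.5, 0.0, -0.5], [0.0, 50.0, 0.0, -50.0])
```

The same function would serve for Attack DB, Release DB or Note Transition DB templates, since each is described as a stored change in gain and pitch.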
[0096] Although this embodiment prepares five databases, synthesis of voices of a song can basically be performed
by using at least Timbre DB, Stationary DB and Articulation DB if the information of voices of a song and pitches, voice
volumes and mouth opening degrees is given.

[0097] Voices of a song rich in expression can be synthesized by using the two additional databases, Note DB and Vibrato
DB. Databases to be added are not limited to Note DB and Vibrato DB; any database for voice expression
may be used.

[0098] The database 7 outputs the EpR parameters of Excitation Curve EC, Chest Resonance CR, Vocal Tract
Resonance VTR, and Spectral Shape Differential SSD calculated by using the above-described databases, as well as
the inharmonic components UC.

[0099] As the inharmonic components UC, the database 7 outputs the magnitude spectrum and phase spectrum
such as shown in Fig. 3. The inharmonic components UC represent noise components of a voiced sound of the original
voice that cannot be expressed as harmonic components, and an unvoiced sound that inherently cannot be expressed as
harmonic components.

[0100] As shown in Fig. 16, Vocal Tract Resonance VTR and the inharmonic components are output separately for
phase and magnitude.

[0101] The adder unit 8a adds Excitation Curve EC to the flat magnitude spectrum output from the windowing & FFT
unit 6. Namely, the magnitude at each frequency calculated by the equation (a) by using EGain, ESlope and ESlope
Depth is added. The addition result is sent to the adder unit 8b at the succeeding stage.

[0102] The obtained magnitude spectrum is a magnitude spectrum envelope (Excitation Curve) of a vocal cord
vibration waveform such as shown in Fig. 4.
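The operation of the adder unit 8a can be sketched as below. The sketch assumes that equation (a) has the exponential form EGain + ESlopeDepth · (e^(-ESlope·f) - 1) with magnitudes in dB, and that the flat spectrum is 0 dB at every bin. The sign convention of the exponent, the frequency grid and the parameter values are assumptions made for illustration.

```python
import math

def excitation_curve_db(f_hz, egain, eslope, eslope_depth):
    """Excitation Curve magnitude in dB at frequency f_hz (assumed form)."""
    return egain + eslope_depth * (math.exp(-eslope * f_hz) - 1.0)

def add_excitation(flat_db, freqs_hz, egain, eslope, eslope_depth):
    """Adder unit 8a: add the Excitation Curve to a flat magnitude spectrum."""
    return [m + excitation_curve_db(f, egain, eslope, eslope_depth)
            for m, f in zip(flat_db, freqs_hz)]

freqs = [0.0, 1000.0, 2000.0, 4000.0]
flat = [0.0] * len(freqs)
spectrum = add_excitation(flat, freqs, egain=-6.0, eslope=0.001, eslope_depth=40.0)
```

Under this form the curve equals EGain at 0 Hz and decays toward EGain - ESlopeDepth at high frequencies, with ESlope setting how quickly it falls.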

[0103] By changing EGain, ESlope and ESlope Depth in accordance with the functions shown in Figs. 14A to 14C
by using the Dynamics parameters, a change in tone color to be caused by a change in voice volume can be expressed.
[0104] If the voice volume is desired to be changed, EGain is changed as shown in Figs. 11A and 11B. If the tone
color is desired to be changed, ESlope is changed as shown in Figs. 12A and 12B.
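The mapping from a Dynamics value to the three excitation parameters in [0103] might look like the following sketch. The functions of Figs. 14A to 14C are not reproduced in this text, so piecewise-linear functions with invented breakpoints stand in for them, and the Dynamics value is assumed to lie in the range 0 to 1.

```python
def piecewise_linear(x, points):
    """Evaluate a piecewise-linear function given sorted (x, y) breakpoints."""
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]

# Hypothetical stand-ins for the functions of Figs. 14A to 14C.
EGAIN_FN = [(0.0, -30.0), (1.0, 0.0)]
ESLOPE_FN = [(0.0, 0.0020), (1.0, 0.0005)]
ESLOPE_DEPTH_FN = [(0.0, 50.0), (1.0, 30.0)]

def epr_for_dynamics(dynamics):
    """Map a Dynamics value in [0, 1] to the three excitation parameters."""
    return {
        "EGain": piecewise_linear(dynamics, EGAIN_FN),
        "ESlope": piecewise_linear(dynamics, ESLOPE_FN),
        "ESlopeDepth": piecewise_linear(dynamics, ESLOPE_DEPTH_FN),
    }

soft = epr_for_dynamics(0.2)
loud = epr_for_dynamics(0.9)
```

Because the three parameters move together, a louder setting changes the spectral tilt as well as the gain, which is the tone color change the paragraph describes.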

[0105] The adder unit 8b adds Chest Resonance CR obtained by the equation (d) to the magnitude spectrum to which
Excitation Curve EC was added at the adder unit 8a, to thereby obtain the magnitude spectrum added with the mountain of the
magnitude spectrum of chest resonance such as shown in Fig. 7. The obtained magnitude spectrum is sent to the
adder unit 8c at the succeeding stage.

[0106] By making the magnitude of Chest Resonance CR large, it is possible to make the chest resonance sound
larger than in the original voice quality. By lowering the frequency of Chest Resonance CR, it is possible to change the
voice to a voice having a lower chest resonance sound.

[0107] The adder unit 8c adds Vocal Tract Resonance VTR obtained by the equation (c1) to the magnitude spectrum
to which Chest Resonance CR was added at the adder unit 8b, to thereby obtain the magnitude spectrum added with the mountain
of the magnitude spectrum of vocal tract such as shown in Fig. 6. The obtained magnitude spectrum is sent to the
adder unit 8e at the succeeding stage.

[0108] By adding Vocal Tract Resonance VTR, it is basically possible to express a difference between tone colors
to be caused by a difference between phonemes such as "a" and "i".

[0109] By changing the amplitude of each resonance in accordance with the Opening parameter described with Fig.
15 by using the frequency function, a change in tone color by a mouth opening degree can be reproduced.
[0110] By changing the frequency, magnitude, and bandwidth of each resonance, the sound quality can be changed
to a sound quality different from the original sound quality (for example, to the sound quality of opera). By changing
the pitch, male voices can be changed to female voices or vice versa.

[0111] The adder unit 8d adds Vocal Tract Resonance VTR obtained by the equation (c2) to the flat phase spectrum
output from the windowing & FFT unit 6. The obtained phase spectrum is sent to the adder unit 8g.
[0112] The adder unit 8e adds Spectral Shape Differential Mag dB (f Hz) to the magnitude spectrum to which Vocal
Tract Resonance VTR was added at the adder unit 8c to obtain a more precise magnitude spectrum.

[0113] The adder unit 8f adds together the magnitude spectrum of the inharmonic components UC supplied from
the database 7 and the magnitude spectrum sent from the adder unit 8e. The added magnitude spectrum is sent to
the IFFT & overlap adder unit 9 at the succeeding stage.

[0114] The adder unit 8g adds together the phase spectrum of the inharmonic components supplied from the database 7
and the phase spectrum supplied from the adder unit 8d. The added phase spectrum is sent to the IFFT &
overlap adder unit 9.
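The chain of adder units 8a to 8g described in [0101] to [0114] can be summarized in a short sketch. It simply adds the arrays element by element, as the text describes, and assumes that all spectra share one frequency grid and one unit. The function names and the four-bin sample values are hypothetical, and the component curves are passed in ready-made rather than computed from equations (a), (c1), (c2) and (d).

```python
def add_spectra(a, b):
    """Element-wise addition of two spectra sampled on the same bins."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bins")
    return [x + y for x, y in zip(a, b)]

def synthesize_frame_spectra(flat_mag, flat_phase, ec, cr, vtr_mag, vtr_phase,
                             ssd, uc_mag, uc_phase):
    """Follow adder units 8a to 8g for one frame."""
    mag = add_spectra(flat_mag, ec)             # 8a: Excitation Curve
    mag = add_spectra(mag, cr)                  # 8b: Chest Resonance
    mag = add_spectra(mag, vtr_mag)             # 8c: Vocal Tract Resonance magnitude
    mag = add_spectra(mag, ssd)                 # 8e: Spectral Shape Differential
    mag = add_spectra(mag, uc_mag)              # 8f: inharmonic magnitude
    phase = add_spectra(flat_phase, vtr_phase)  # 8d: Vocal Tract Resonance phase
    phase = add_spectra(phase, uc_phase)        # 8g: inharmonic phase
    return mag, phase

bins = 4
zeros = [0.0] * bins
mag, phase = synthesize_frame_spectra(
    zeros, zeros,
    ec=[-6.0, -10.0, -14.0, -18.0], cr=[3.0, 0.0, 0.0, 0.0],
    vtr_mag=[0.0, 5.0, 2.0, 0.0], vtr_phase=[0.0, 0.3, -0.2, 0.0],
    ssd=[0.5, -0.5, 0.0, 0.25], uc_mag=[0.0, 0.0, 0.0, 0.0],
    uc_phase=[0.0, 0.0, 0.0, 0.0],
)
```

The magnitude and phase paths are kept separate until the final stage, mirroring the two outputs that are sent to the IFFT & overlap adder unit 9.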

[0115] The IFFT & overlap adder unit 9 performs inverse fast Fourier transform (IFFT) of the supplied magnitude
spectrum and phase spectrum, and overlap-adds together the transformed time waveforms to generate final synthesized
voices.
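The final stage in [0115] can be illustrated with the sketch below. To keep the example free of dependencies, a naive inverse DFT stands in for the IFFT; it computes the same result far less efficiently. The sketch assumes that the magnitude is a linear amplitude (a dB spectrum would be converted first) and that each spectrum covers a full N-bin frame. The frame length and hop size are hypothetical.

```python
import cmath
import math

def inverse_dft(magnitude, phase):
    """Naive inverse DFT of a full N-bin spectrum given as magnitude and phase.

    Stands in for the IFFT; returns the real part of each output sample.
    """
    n = len(magnitude)
    bins = [m * cmath.exp(1j * p) for m, p in zip(magnitude, phase)]
    return [
        sum(b * cmath.exp(2j * math.pi * k * t / n) for k, b in enumerate(bins)).real / n
        for t in range(n)
    ]

def overlap_add(frames, hop):
    """Sum equally long time-domain frames placed `hop` samples apart."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for j, sample in enumerate(frame):
            out[i * hop + j] += sample
    return out

# Two identical 8-bin frames containing one cosine (bins 1 and 7 are conjugates).
mag = [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]
phs = [0.0] * 8
frame = inverse_dft(mag, phs)
signal = overlap_add([frame, frame], hop=4)
```

In practice each frame would have been windowed before analysis so that the overlapping regions sum smoothly; the sketch omits windowing and shows only the placement and summation of frames.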

[0116] According to the embodiment, a voice is analyzed into harmonic components and inharmonic components.
The analyzed harmonic components can be further analyzed into the magnitude spectrum envelope of a vocal cord
waveform, a plurality of resonances, and the difference between the original voice and the sum of this envelope and
these resonances, all of which are stored.
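The difference term summarized in [0116] can be sketched as a residual computation. The sketch assumes that the harmonic envelope, the excitation curve and each resonance curve are given in dB on the same frequency bins, so that the model is their sum. The function name and sample values are hypothetical.

```python
def spectral_shape_differential(harmonic_env_db, excitation_db, resonances_db):
    """Residual of the harmonic envelope after removing the EpR model.

    `resonances_db` is a list of per-resonance curves; all inputs are assumed
    to be in dB on the same frequency bins.
    """
    ssd = []
    for i, target in enumerate(harmonic_env_db):
        model = excitation_db[i] + sum(r[i] for r in resonances_db)
        ssd.append(target - model)
    return ssd

harmonic = [-2.5, -5.5, -12.0]
excitation = [-6.0, -10.0, -14.0]
resonances = [[3.0, 0.0, 0.0], [0.0, 5.0, 2.0]]
ssd = spectral_shape_differential(harmonic, excitation, resonances)
```

Storing this residual means that adding it back to the model during synthesis recovers the analyzed harmonic envelope exactly when the parameters are left unchanged.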

[0117] According to the embodiment, the magnitude spectrum envelope of a vocal cord waveform can be represented
by three EpR parameters EGain, ESlope and ESlope Depth.

[0118] According to the embodiment, by changing the EpR parameter corresponding to a change in voice volume
in accordance with a prepared function, voice given a natural tone color change caused by a change in voice volume
can be synthesized.

[0119] According to the embodiment, by changing the EpR parameter corresponding to a change in mouth opening 
degree in accordance with a prepared function, voice given a natural tone color change caused by a change in mouth 
opening degree can be synthesized. 

[0120] Since the functions can be changed with each phoneme and each voice producer, voice can be synthesized
by taking into consideration an individual characteristic difference between tone color changes caused by phonemes
and voice producers.

[0121] Although the embodiment has been described mainly with reference to synthesis of voices of a song sung
by a singer, the embodiment is not limited thereto; general speech sounds and musical instrument sounds
can also be synthesized in a similar manner.

[0122] The embodiment may be realized by a computer or the like installed with a computer program and the like
realizing the embodiment functions.

[0123] In this case, the computer program and the like realizing the embodiment functions may be stored in a computer
readable storage medium such as a CD-ROM and a floppy disc to distribute it to a user.

[0124] If the computer and the like are connected to a communication network such as a LAN, the Internet and a
telephone line, the computer program, data and the like may be supplied via the communication network.
[0125] The present invention has been described in connection with the preferred embodiments. The invention is
not limited only to the above embodiments. It is apparent that various modifications, improvements, combinations, and
the like can be made by those skilled in the art.

Claims

1. A voice analyzing apparatus comprising:

first analyzing means for analyzing a voice into harmonic components and inharmonic components;

second analyzing means for analyzing a magnitude spectrum envelope of the harmonic components into
a magnitude spectrum envelope of a vocal cord vibration waveform, resonances and a spectrum envelope
of a difference of the magnitude spectrum envelope of the harmonic components from a sum of the magnitude
spectrum envelope of the vocal cord vibration waveform and the resonances; and
means for storing the inharmonic components, the magnitude spectrum envelope of the vocal cord vibration
waveform, resonances and the spectrum envelope of the difference.

2. A voice analyzing apparatus according to claim 1, wherein:

the magnitude spectrum envelope of the vocal cord vibration waveform is represented by three parameters
EGain, ESlope and ESlope Depth; and
the three parameters can be expressed by a following equation (1):

$$\mathrm{ExcitationCurveMag}(f) = \mathrm{EGain} + \mathrm{ESlopeDepth} \cdot \left(e^{-\mathrm{ESlope} \cdot f} - 1\right) \qquad (1)$$

where ExcitationCurveMag(f) is the magnitude spectrum envelope of the vocal cord vibration waveform.

3. A voice analyzing apparatus according to claim 1, wherein the resonances include a plurality of resonances
expressing vocal tract formants and a resonance expressing chest resonance.

4. A voice synthesizing apparatus comprising:

means for storing a magnitude spectrum envelope of a vocal cord vibration waveform, resonances and a
spectrum envelope of a difference of a magnitude spectrum envelope of harmonic components from a sum
of the magnitude spectrum envelope of the vocal cord vibration waveform and the resonances, respectively
analyzed from the harmonic components analyzed from a voice and inharmonic components analyzed from
the voice;

means for inputting information of a voice to be synthesized;
means for generating a flat magnitude spectrum envelope; and

means for adding the inharmonic components, the magnitude spectrum envelope of the vocal cord vibration
waveform, resonances and the spectrum envelope of the difference, respectively read from said means for
storing, to the flat magnitude spectrum envelope, in accordance with the input information.

5. A voice synthesizing apparatus according to claim 4, wherein:

the magnitude spectrum envelope of the vocal cord vibration waveform is represented by three parameters
EGain, ESlope and ESlope Depth; and

the three parameters can be expressed by a following equation (1):

$$\mathrm{ExcitationCurveMag}(f) = \mathrm{EGain} + \mathrm{ESlopeDepth} \cdot \left(e^{-\mathrm{ESlope} \cdot f} - 1\right) \qquad (1)$$

where ExcitationCurveMag(f) is the magnitude spectrum envelope of the vocal cord vibration waveform.

6. A voice synthesizing apparatus according to claim 5, wherein said means for storing further stores a function for
changing the three parameters in accordance with a change in sound volume so that tone color can be changed
in accordance with the change in sound volume.

7. A voice synthesizing apparatus according to claim 4, wherein the resonances include a plurality of resonances
expressing vocal tract formants and a resonance expressing chest resonance.

8. A voice synthesizing apparatus according to claim 7, wherein said means for storing further stores a function for
changing an amplitude of each resonance in accordance with a mouth opening degree so that tone color can be
changed in accordance with the mouth opening degree.

9. A voice synthesizing apparatus comprising:

first analyzing means for analyzing a voice into harmonic components and inharmonic components;

second analyzing means for analyzing a magnitude spectrum envelope of the harmonic components into
a magnitude spectrum envelope of a vocal cord vibration waveform, resonances and a spectrum envelope
of a difference of the magnitude spectrum envelope of the harmonic components from a sum of the magnitude
spectrum envelope of the vocal cord vibration waveform and the resonances;
means for storing the inharmonic components, the magnitude spectrum envelope of the vocal cord vibration
waveform, resonances and the spectrum envelope of the difference;
means for inputting information of a voice to be synthesized;
means for generating a flat magnitude spectrum envelope; and

means for adding the inharmonic components, the magnitude spectrum envelope of the vocal cord vibration
waveform, resonances and the spectrum envelope of the difference, respectively read from said
means for storing, to the flat magnitude spectrum envelope, in accordance with the input information.

10. A voice analyzing method comprising the steps of:

(a) analyzing a voice into harmonic components and inharmonic components;

(b) analyzing a magnitude spectrum envelope of the harmonic components into a magnitude spectrum envelope
of a vocal cord vibration waveform, resonances and a spectrum envelope of a difference of the magnitude
spectrum envelope of the harmonic components from a sum of the magnitude spectrum envelope of the vocal
cord vibration waveform and the resonances; and

(c) storing the inharmonic components, the magnitude spectrum envelope of the vocal cord vibration waveform,
resonances and the spectrum envelope of the difference.

11. A voice synthesizing method comprising the steps of:

(a) reading a magnitude spectrum envelope of a vocal cord vibration waveform, resonances and a spectrum
envelope of a difference of a magnitude spectrum envelope of harmonic components from a sum of the
magnitude spectrum envelope of the vocal cord vibration waveform and the resonances, respectively analyzed
from the harmonic components analyzed from a voice and inharmonic components analyzed from the voice;

(b) inputting information of a voice to be synthesized;

(c) generating a flat magnitude spectrum envelope; and

(d) adding the inharmonic components, the magnitude spectrum envelope of the vocal cord vibration waveform,
resonances and the spectrum envelope of the difference, respectively read at said step (a), to the flat
magnitude spectrum envelope, in accordance with the input information.

12. A program that a computer executes to realize a music data performance process, comprising the instructions of:

(a) analyzing a voice into harmonic components and inharmonic components;

(b) analyzing a magnitude spectrum envelope of the harmonic components into a magnitude spectrum envelope
of a vocal cord vibration waveform, resonances and a spectrum envelope of a difference of the magnitude
spectrum envelope of the harmonic components from a sum of the magnitude spectrum envelope of the vocal
cord vibration waveform and the resonances; and

(c) storing the inharmonic components, the magnitude spectrum envelope of the vocal cord vibration waveform,
resonances and the spectrum envelope of the difference.

13. A program that a computer executes to realize a music data performance process, comprising the instructions of:

(a) reading a magnitude spectrum envelope of a vocal cord vibration waveform, resonances and a spectrum
envelope of a difference of a magnitude spectrum envelope of harmonic components from a sum of the
magnitude spectrum envelope of the vocal cord vibration waveform and the resonances, respectively analyzed
from the harmonic components analyzed from a voice and inharmonic components analyzed from the voice;

(b) inputting information of a voice to be synthesized;

(c) generating a flat magnitude spectrum envelope; and

(d) adding the inharmonic components, the magnitude spectrum envelope of the vocal cord vibration waveform,
resonances and the spectrum envelope of the difference, respectively read at said step (a), to the flat
magnitude spectrum envelope, in accordance with the input information.

[Drawings: FIG. 1, FIG. 3, FIG. 7, FIG. 9, FIG. 11A, FIG. 11B, FIG. 12A, FIG. 13A, FIG. 14A, FIG. 16]